adaQN: An Adaptive Quasi-Newton Algorithm for Training RNNs

Authors

  • Nitish Shirish Keskar
  • Albert S. Berahas
Abstract

Recurrent Neural Networks (RNNs) are powerful models that achieve unparalleled performance on several pattern recognition problems. However, training RNNs is a computationally difficult task owing to the well-known "vanishing/exploding" gradient problems. In recent years, several algorithms have been proposed for training RNNs. These algorithms either exploit no (or limited) curvature information and have cheap per-iteration complexity, or attempt to gain significant curvature information at the cost of increased per-iteration cost. The former set includes diagonally-scaled first-order methods such as ADAM and ADAGRAD, while the latter consists of second-order algorithms like Hessian-Free Newton and K-FAC. In this paper, we present a novel stochastic quasi-Newton algorithm (adaQN) for training RNNs. Our approach retains a low per-iteration cost while allowing for non-diagonal scaling through a stochastic L-BFGS updating scheme. The method is judicious in storing and retaining L-BFGS curvature pairs, which is indirectly used as a means of controlling the quality of the steps. We present numerical experiments on two language modeling tasks and show that adaQN performs on par with, if not better than, popular RNN training algorithms. These results suggest that quasi-Newton algorithms have the potential to be a viable alternative to first- and second-order methods for training RNNs.
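Only the abstract is reproduced on this page, so the details of adaQN's stochastic updating scheme are not shown here. The generic building block it adapts, the L-BFGS two-loop recursion together with a guarded rule for which curvature pairs to keep, can nevertheless be sketched. The sketch below is a minimal illustration in NumPy, not the authors' algorithm: the function names, the memory size, and the s'y > eps * s's acceptance test are assumptions chosen to convey the mechanism, whereas adaQN constructs and vets its curvature pairs stochastically in ways this page does not describe.

```python
import numpy as np

def two_loop_direction(grad, s_list, y_list):
    # Classic L-BFGS two-loop recursion: computes H @ grad, where H is the
    # inverse-Hessian approximation implied by the stored (s, y) pairs.
    q = grad.copy()
    rhos = [1.0 / np.dot(s, y) for s, y in zip(s_list, y_list)]
    alphas = []
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        alpha = rho * np.dot(s, q)
        alphas.append(alpha)
        q = q - alpha * y
    if s_list:
        # Standard initial scaling gamma = s'y / y'y from the newest pair.
        gamma = np.dot(s_list[-1], y_list[-1]) / np.dot(y_list[-1], y_list[-1])
        q = gamma * q
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * np.dot(y, q)
        q = q + (alpha - beta) * s
    return q  # the step direction is -q

def maybe_store_pair(s, y, s_list, y_list, memory=10, eps=1e-8):
    # Cautious update (an assumed, standard test): keep the pair only if it
    # carries sufficiently positive curvature, so the inverse-Hessian
    # approximation stays positive definite; otherwise skip it. This is one
    # simple way to be "judicious" about which pairs enter the memory.
    if np.dot(s, y) > eps * np.dot(s, s):
        s_list.append(s)
        y_list.append(y)
        if len(s_list) > memory:
            s_list.pop(0)
            y_list.pop(0)
```

A driver loop would call maybe_store_pair after each parameter update with s the iterate difference and y the corresponding gradient difference, then step in the direction -two_loop_direction(g, s_list, y_list). Skipping low-quality pairs, as above, is one inexpensive way to control step quality without a diagonal-only scaling.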


Similar articles

An adaptive quasi-Newton algorithm for eigensubspace estimation

In this paper, we derive and discuss a new adaptive quasi-Newton eigen-estimation algorithm and compare it with RLS-type adaptive algorithms and the quasi-Newton algorithm proposed by Mathew et al. through experiments on stationary and nonstationary data.


A class of multi-agent discrete hybrid non-linearizable systems: Optimal controller design based on a quasi-Newton algorithm for a class of sign-indefinite Hessian cost functions

In the present paper, a class of hybrid, nonlinear and non-linearizable dynamic systems is considered. This dynamic system is generalized to a multi-agent configuration. The interaction of agents is modeled using graph theory, and an interaction tensor defines the multi-agent system in leader-follower consensus in order to design a suitable controller for the system. A...


CSLMEN: A New Optimized Method for Training Levenberg Marquardt Elman Network Based Cuckoo Search Algorithm

RNNs have local feedback loops within the network, which allow them to store earlier accessed patterns. Such a network can be trained with gradient-descent backpropagation; optimization techniques such as second-order methods (conjugate gradient, quasi-Newton, Levenberg-Marquardt) have also been used for network training [14, 15]. Still, this algorithm is not guaranteed to find the global m...


Quasi-Newton Methods for Nonconvex Constrained Multiobjective Optimization

Here, a quasi-Newton algorithm for constrained multiobjective optimization is proposed. Under suitable assumptions, global convergence of the algorithm is established.


A limited memory adaptive trust-region approach for large-scale unconstrained optimization

This study concerns a trust-region-based method for solving unconstrained optimization problems. The approach takes advantage of the compact limited-memory BFGS updating formula together with an appropriate adaptive radius strategy. In our approach, the adaptive technique reduces the number of subproblems solved while utilizing the structure of limited memory quasi-Newt...




Publication date: 2016